Tbl2KnownGene: A command-line program to convert NCBI.tbl to UCSC knownGene.txt data file
Author:
Abstract
The schema for UCSC Known Genes (knownGene.txt) has been widely adopted by both standard and custom downstream analysis tools and scripts. However, for many popular model organisms (e.g. Arabidopsis), sequence and annotation data tables, including knownGene.txt, have not yet been made publicly available. This paper therefore describes Tbl2KnownGene, a parser that processes the contents of an NCBI .tbl file and produces a UCSC Known Genes annotation feature table. The algorithm was tested with chromosome datasets from the Arabidopsis genome (TAIR10). The Tbl2KnownGene parser can also be applied to data from other organisms with similar .tbl annotations.
Availability: Perl scripts and the required input files are available at http://thoth.indstate.edu/~ybai2/Tbl2KnownGene/index.html.
Similar resources
SNPAAMapperT2K: A genome-wide SNP downstream analysis and annotation pipeline for species annotated with NCBI.tbl data files
SNPAAMapper, a genome-wide SNP downstream analysis and annotation pipeline, was designed to classify detected variants according to genomic regions and report the mutation class by processing whole-genome and/or whole-exome sequencing data. A widely used sequence and data annotation table format "knownGene.txt" has not yet been created for many popular model organisms (e.g. Arabidops...
Image Operations using a Semi-compressed Contour Tree Image Definition
The contour tree file format has been used for a few years as a suitable storage format for most image types. The technique stores unique regions into a hierarchical data structure which defines the complete raster image. This data structure is called a contour tree. It compares very favourably with other lossless coding schemes on all image format types, including bi-level and reduced colour i...
PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs
The analysis of genetic data often requires a combination of several approaches using different and sometimes incompatible programs. In order to facilitate data exchange and file conversions between population genetics programs, we introduce PGDSpider, a Java program that can read 27 different file formats and export data into 29, partially overlapping, other file formats. The PGDSpi...
Xtriage and Fest: automatic assessment of X-ray data and substructure structure factor estimation
Xtriage: A command-line utility that allows the user to rapidly assess the quality and specific idiosyncrasies of an X-ray dataset has been developed. The program, called Xtriage, combines the twin analysis tools as described in a previous CCP4 newsletter (Zwart, et al., 2005) with other data quality indicators. In the following sections, the various steps in the characterization of an X-ray dat...
rat: A Secure Archiving Program With Fast Retrieval
A new archive format called rat was developed. This format was designed to allow very fast retrieval of individual files. This is achieved using a table of contents to quickly find the file. Each file in the archive is individually compressed with a compression method specific to the file. A user-created configuration file is used to specify what type of compression to use on each file based on...
Journal title:
Volume 10, Issue
Pages: -
Publication date: 2014